This study aims to explain which factors most strongly influence differences in wage levels between Polish powiats (the equivalent of counties). It investigates regional wage disparities in Poland by applying machine learning models combined with Explanatory Model Analysis techniques. Using powiat-level data from the Local Data Bank (Pol. Bank Danych Lokalnych, BDL) for 2010 and 2023, we developed a neural network framework to predict wage levels from economic, demographic, infrastructural and environmental variables. To interpret the model, we employed the Variable Importance over Permutation (VIP) and SHapley Additive exPlanations (SHAP) approaches, which provide insight into both global feature importance and the local contributions of individual variables. The results indicate that the share of the productive population, unemployment rates and social vulnerability remain key determinants of wage differences in both years, although their relative influence shifts markedly between 2010 and 2023. The SHAP analysis shows that individual powiats, such as Jelenia Góra and Wrocław, exhibit distinct factor dynamics, with the roles of demographic and infrastructural variables varying across the studied years. The findings highlight the potential of combining machine learning with explainability methods to uncover complex, nonlinear determinants of wages, offering a more transparent analytical basis for understanding evolving regional disparities.
Keywords: deep learning, machine learning, explanatory model analysis, wage disparities
JEL classification: C15, C45, O150
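The permutation-based variable importance named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the feature names (`working_age_share`, `unemployment_rate`, `noise`) are hypothetical labels, and a fixed linear function stands in for the trained neural network. The idea is to shuffle one column at a time and record how much the prediction error (here RMSE) increases.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(r) for r in rows])
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(y, [predict(r) for r in permuted]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Synthetic stand-in data (assumption: not the BDL data used in the study).
data_rng = random.Random(42)
feature_names = ["working_age_share", "unemployment_rate", "noise"]  # hypothetical
rows = [[data_rng.gauss(0, 1) for _ in feature_names] for _ in range(300)]


def predict(row):
    """Stand-in for a trained model (assumption); ignores the third feature."""
    return 3.0 * row[0] - 1.0 * row[1]


# Target = model signal plus small noise, so the baseline error is low.
y = [predict(r) + data_rng.gauss(0, 0.1) for r in rows]

scores = permutation_importance(predict, rows, y)
for name, score in zip(feature_names, scores):
    print(f"{name}: {score:.3f}")
```

In this toy setting the feature with the largest coefficient receives the highest score, and the feature the model ignores scores zero. With a real model, `predict` would wrap the fitted network and `rows` would hold the observed powiat-level variables.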